Data Balancing for Technologically Assisted Reviews: Undersampling or Reweighting
Abstract
This paper presents approaches for the automated support of citation screening in systematic reviews. Continuous active learning is used as the baseline approach, on top of which two data balancing techniques are applied to handle the class imbalance problem. These two techniques, aggressive undersampling and reweighting, are tested and compared on 20 data sets of Diagnostic Test Accuracy (DTA) reviews. Results are evaluated with last_rel (the rank at which the last relevant document is found) and suggest that reweighting outperforms undersampling: it not only balances the training data but also emphasizes “content relevant” examples over “abstract relevant” ones, and thus helps to retrieve “content relevant” papers earlier.
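To make the comparison concrete, here is a minimal stdlib-only Python sketch of a continuous active learning (CAL) loop in which the training set is balanced either by undersampling or by reweighting, plus the last_rel measure. It rests on several assumptions not taken from the paper. Documents are taken to be ready-made feature vectors. The classifier is a plain weighted logistic regression fitted by gradient descent, since the abstract does not name one. A random variant stands in for the paper's aggressive undersampling, because the abstract does not give its exact selection rule. Reweighting is approximated by inverse class frequency. All function names (`cal_screen`, `undersample`, `inverse_frequency_weights`, `train_logreg`, `last_rel`) and the toy data are hypothetical illustrations rather than the authors' code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(theta, x):
    # Linear score; zip stops at len(x), so the bias (last entry) is added separately.
    return sum(t * v for t, v in zip(theta, x)) + theta[-1]


def inverse_frequency_weights(labels):
    """Reweighting (assumed scheme): weight each example by n / (k * n_c),
    so every class contributes the same total weight to the training loss."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    k = len(counts)
    return [n / (k * counts[c]) for c in labels]


def undersample(labels, rng):
    """Undersampling (random variant, an assumption): keep every relevant
    example plus an equally sized random subset of irrelevant ones.
    Returns positions into `labels`."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return sorted(pos + rng.sample(neg, min(len(neg), len(pos))))


def train_logreg(X, y, w, epochs=200, lr=0.5):
    """Weighted logistic regression fitted by batch gradient descent.
    Returns the weight vector with the bias as its last entry."""
    d = len(X[0])
    theta = [0.0] * (d + 1)
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, w):
            err = wi * (sigmoid(score(theta, xi)) - yi)
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        theta = [t - lr * g / total for t, g in zip(theta, grad)]
    return theta


def cal_screen(X, truth, seed, balance="reweight", batch=2, rng=None):
    """CAL: retrain on everything screened so far, then screen the
    top-scoring unscreened documents, until none remain. Returns the
    screening order. `seed` must hold one relevant and one irrelevant doc."""
    rng = rng or random.Random(0)
    order = list(seed)
    while len(order) < len(X):
        labels = [truth[i] for i in order]  # labels revealed by screening
        if balance == "undersample":
            idx = [order[p] for p in undersample(labels, rng)]
            w = [1.0] * len(idx)
        else:  # "reweight": train on all screened docs with class weights
            idx, w = order, inverse_frequency_weights(labels)
        theta = train_logreg([X[i] for i in idx], [truth[i] for i in idx], w)
        screened = set(order)
        pool = [i for i in range(len(X)) if i not in screened]
        pool.sort(key=lambda i: score(theta, X[i]), reverse=True)
        order.extend(pool[:batch])
    return order


def last_rel(order, truth):
    """1-based rank at which the last relevant document is screened."""
    return max(r for r, i in enumerate(order, 1) if truth[i] == 1)


# Toy collection: 3 relevant and 7 irrelevant documents with 2 features each.
X = [[1.0, 0.9], [0.9, 1.0], [0.8, 0.7], [0.1, 0.2], [0.0, 0.1],
     [0.2, 0.0], [0.1, 0.1], [0.3, 0.2], [0.2, 0.3], [0.0, 0.0]]
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
for mode in ("undersample", "reweight"):
    order = cal_screen(X, truth, seed=[0, 3], balance=mode)
    print(mode, order, "last_rel =", last_rel(order, truth))
```

The sketch treats relevance as binary. The distinction between “content relevant” and “abstract relevant” examples discussed in the abstract would require graded labels, which this example does not model. Lower last_rel is better, since it means all relevant papers were found after screening fewer documents.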
Similar resources
LIMSI@CLEF eHealth 2017 Task 2: Logistic Regression for Automatic Article Ranking
This paper describes the participation of the LIMSI-MIROR team at CLEF eHealth 2017, task 2. The task addresses the automatic ranking of articles in order to assist with the screening process of Diagnostic Test Accuracy (DTA) Systematic Reviews. We used a logistic regression classifier and handled class imbalance using a combination of class reweighting and undersampling. We also experimented w...
Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy
Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (over...
Shearlet-based compressed sensing for fast 3D cardiac MR imaging using iterative reweighting
High-resolution three-dimensional (3D) cardiovascular magnetic resonance (CMR) is a valuable medical imaging technique, but its widespread application in clinical practice is hampered by long acquisition times. Here we present a novel compressed sensing (CS) reconstruction approach using shearlets as a sparsifying transform allowing for fast 3D CMR (3DShearCS). Shearlets are mathematically opti...
ebalance: A Stata Package for Entropy Balancing
The Stata package ebalance implements entropy balancing, a multivariate reweighting method described in Hainmueller (2012) that allows users to reweight a dataset such that the covariate distributions in the reweighted data satisfy a set of specified moment conditions. This can be useful to create balanced samples in observational studies with a binary treatment where the control group data can...
A Swarm Intelligence Approach in Undersampling Majority Class
Over the years, machine learning has been facing the issue of imbalanced datasets. It occurs when the number of instances in one class significantly outnumbers the instances in the other class. This study investigates a new approach for balancing the dataset using a swarm intelligence technique, Stochastic Diffusion Search (SDS), to undersample the majority class on a direct marketing dataset. Th...
Publication date: 2017